training observation
JUCAL: Jointly Calibrating Aleatoric and Epistemic Uncertainty in Classification Tasks
Heiss, Jakob, Lambrecht, Sören, Weissteiner, Jakob, Wutte, Hanna, Žurič, Žan, Teichmann, Josef, Yu, Bin
We study post-calibration uncertainty for trained ensembles of classifiers. Specifically, we consider both aleatoric (label noise) and epistemic (model) uncertainty. Among the most popular and widely used calibration methods in classification are temperature scaling (i.e., pool-then-calibrate) and conformal methods. However, the main shortcoming of these calibration methods is that they do not balance the proportion of aleatoric and epistemic uncertainty. Not balancing these uncertainties can severely misrepresent predictive uncertainty, leading to overconfident predictions in some input regions while being underconfident in others. To address this shortcoming, we present a simple but powerful calibration algorithm Joint Uncertainty Calibration (JUCAL) that jointly calibrates aleatoric and epistemic uncertainty. JUCAL jointly calibrates two constants to weight and scale epistemic and aleatoric uncertainties by optimizing the negative log-likelihood (NLL) on the validation/calibration dataset. JUCAL can be applied to any trained ensemble of classifiers (e.g., transformers, CNNs, or tree-based methods), with minimal computational overhead, without requiring access to the models' internal parameters. We experimentally evaluate JUCAL on various text classification tasks, for ensembles of varying sizes and with different ensembling strategies. Our experiments show that JUCAL significantly outperforms SOTA calibration methods across all considered classification tasks, reducing NLL and predictive set size by up to 15% and 20%, respectively. Interestingly, even applying JUCAL to an ensemble of size 5 can outperform temperature-scaled ensembles of size up to 50 in terms of NLL and predictive set size, resulting in up to 10 times smaller inference costs. Thus, we propose JUCAL as a new go-to method for calibrating ensembles in classification.
NeurIPS2021_emergent_group_communication (7).pdf
We generate 128,000 images as agents' observations using python's matplotlib library Hunter [2007] V ariational autoencoder [Kingma and Welling, 2014] is used to encode the observations. Input is flatted 30,720-dimensional vector (32 by 320 by 3). Both encoder and decoder have one hidden layer with the dimension size being 1,024. The output (communication message) is a 10-dimensional vector. ReLU is used as the activation function.
A Unified Optimization Framework for Multiclass Classification with Structured Hyperplane Arrangements
Blanco, Víctor, Kothari, Harshit, Luedtke, James
In this paper, we propose a new mathematical optimization model for multiclass classification based on arrangements of hyperplanes. Our approach preserves the core support vector machine (SVM) paradigm of maximizing class separation while minimizing misclassification errors, and it is computationally more efficient than a previous formulation. We present a kernel-based extension that allows it to construct nonlinear decision boundaries. Furthermore, we show how the framework can naturally incorporate alternative geometric structures, including classification trees, $\ell_p$-SVMs, and models with discrete feature selection. To address large-scale instances, we develop a dynamic clustering matheuristic that leverages the proposed MIP formulation. Extensive computational experiments demonstrate the efficiency of the proposed model and dynamic clustering heuristic, and we report competitive classification performance on both synthetic datasets and real-world benchmarks from the UCI Machine Learning Repository, comparing our method with state-of-the-art implementations available in scikit-learn.
Dual Interpretation of Machine Learning Forecasts
Coulombe, Philippe Goulet, Goebel, Maximilian, Klieber, Karin
Machine learning predictions are typically interpreted as the sum of contributions of predictors. Yet, each out-of-sample prediction can also be expressed as a linear combination of in-sample values of the predicted variable, with weights corresponding to pairwise proximity scores between current and past economic events. While this dual route leads nowhere in some contexts (e.g., large cross-sectional datasets), it provides sparser interpretations in settings with many regressors and little training data-like macroeconomic forecasting. In this case, the sequence of contributions can be visualized as a time series, allowing analysts to explain predictions as quantifiable combinations of historical analogies. Moreover, the weights can be viewed as those of a data portfolio, inspiring new diagnostic measures such as forecast concentration, short position, and turnover. We show how weights can be retrieved seamlessly for (kernel) ridge regression, random forest, boosted trees, and neural networks. Then, we apply these tools to analyze post-pandemic forecasts of inflation, GDP growth, and recession probabilities. In all cases, the approach opens the black box from a new angle and demonstrates how machine learning models leverage history partly repeating itself.
Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation
Chen, Louis L., Chern, Bobbie, Eckstrand, Eric, Mahapatra, Amogh, Royset, Johannes O.
Labeling errors in datasets are common, if not systematic, in practice. They naturally arise in a variety of contexts--human labeling, noisy labeling, and weak labeling (i.e., image classification), for example. This presents a persistent and pervasive stress on machine learning practice. In particular, neural network (NN) architectures can withstand minor amounts of dataset imperfection with traditional countermeasures such as regularization, data augmentation, and batch normalization. However, major dataset imperfections often prove insurmountable. We propose and study the implementation of Rockafellian Relaxation (RR), a new loss reweighting, architecture-independent methodology, for neural network training. Experiments indicate RR can enhance standard neural network methods to achieve robust performance across classification tasks in computer vision and natural language processing (sentiment analysis). We find that RR can mitigate the effects of dataset corruption due to both (heavy) labeling error and/or adversarial perturbation, demonstrating effectiveness across a variety of data domains and machine learning tasks.
Interpretable Machine Learning for TabPFN
Rundel, David, Kobialka, Julius, von Crailsheim, Constantin, Feurer, Matthias, Nagler, Thomas, Rügamer, David
The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes. The TabPFN model, a special case of PFNs for tabular data, is able to achieve state-of-the-art performance on a variety of classification tasks while producing posterior predictive distributions in mere seconds by in-context learning without the need for learning parameters or hyperparameter tuning. This makes TabPFN a very attractive option for a wide range of domain applications. However, a major drawback of the method is its lack of interpretability. Therefore, we propose several adaptations of popular interpretability methods that we specifically design for TabPFN. By taking advantage of the unique properties of the model, our adaptations allow for more efficient computations than existing implementations. In particular, we show how in-context learning facilitates the estimation of Shapley values by avoiding approximate retraining and enables the use of Leave-One-Covariate-Out (LOCO) even when working with large-scale Transformers. In addition, we demonstrate how data valuation methods can be used to address scalability challenges of TabPFN.
Estimation of minimum miscibility pressure (MMP) in impure/pure N2 based enhanced oil recovery process: A comparative study of statistical and machine learning algorithms
Zhu, Xiuli, Damarla, Seshu Kumar, Huang, Biao
Minimum miscibility pressure (MMP) prediction plays an important role in design and operation of nitrogen based enhanced oil recovery processes. In this work, a comparative study of statistical and machine learning methods used for MMP estimation is carried out. Most of the predictive models developed in this study exhibited superior performance over correlation and predictive models reported in literature.
Confidence intervals for the random forest generalization error
How confident can we be in the generalization capacity of a predictive model? Of the many devices discussed in the statistical learning literature [1, 2, 3], a simple random split of the original data into training and test sets, and methods of folded cross-validation, stand out as the most common tools used to tackle the generalization issue. Availability of point estimates for the generalization error given by these procedures naturally raises the question of how to quantify the uncertainty involved in these estimates spending a manageable computational cost. Random forests [4] elegantly provide an alternative low cost (almost free) point estimate of the generalization error without requiring splittings of the data, and avoiding the computational burden of retraining the predictive model several times. The bagging mechanism [5] used to construct the ensemble of trees implies that each training data point is not used (stays "out-of-bag") when growing approximately 36.8% of the trees in the forest. This property gives us the so called out-of-bag estimate of the random forest generalization error: for each observation, using a suitable loss function, we compute the predictive error made by the random subforest whose trees didn't include the observation under consideration in its training process; the out-of-bag estimate is the average of these prediction errors over the whole training sample. 1
MCCE: Monte Carlo sampling of realistic counterfactual explanations
Redelmeier, Annabelle, Jullum, Martin, Aas, Kjersti, Løland, Anders
In this paper we introduce MCCE: Monte Carlo sampling of realistic Counterfactual Explanations, a model-based method that generates counterfactual explanations by producing a set of feasible examples using conditional inference trees. Unlike algorithmic-based counterfactual methods that have to solve complex optimization problems or other model based methods that model the data distribution using heavy machine learning models, MCCE is made up of only two light-weight steps (generation and post-processing). MCCE is also straightforward for the end user to understand and implement, handles any type of predictive model and type of feature, takes into account actionability constraints when generating the counterfactual explanations, and generates as many counterfactual explanations as needed. In this paper we introduce MCCE and give a comprehensive list of performance metrics that can be used to compare counterfactual explanations. We also compare MCCE with a range of state-of-the-art methods and a new baseline method on benchmark data sets. MCCE outperforms all model-based methods and most algorithmic-based methods when also taking into account validity (i.e., a correctly changed prediction) and actionability constraints. Finally, we show that MCCE has the strength of performing almost as well when given just a small subset of the training data.